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Abstract 



Systematic inaccuracy is inherent in any computational estimate of a non- 
linear average, such as the free energy difference AF between two states or 
systems, because of the availability of only a finite number of data values, N. 
In previous work, we outlined the fundamental statistical description of this 
"finite-sampling error." We now give a more complete presentation of (i) rig- 
orous general bounds on the free energy and other nonlinear averages, which 
generalize Jensen's inequality; (ii) asymptotic N — > oo expansions of the av- 
erage behavior of the finite-sampling error in AF estimates; (iii) illustrative 
examples of large- iV behavior, both in free-energy and other calculations; and 
(iv) the universal, large-iV relation between the average finite-sampling er- 
ror and the fluctuation in the error. An explicit role is played by Levy and 
Gaussian limiting distributions. 



I. INTRODUCTION 



Because of the substantial recent interest in free energy difference AF calculations (e.g., 
[1-14]), this report discusses the unavoidable error that arises from use of a finite amount of 
computer time. There is a tremendous range of applications for computational AF estimates 
in physical, chemical, and biological systems. Examples include computations relating crys- 
talline lattices [4,8], the behavior of magnetic models [8,15], and biomolecular binding events 
— of ligands to both DNA and proteins (e.g., [16-19,1]). Computations of AF, moreover, are 
formally equivalent to calculating the temperature dependence F(T) [8]. Most recently, it 
has been pointed out that AF calculations are required to convert experimental data from 
nonequilibrium single-molecule pulling measurements to free energy vs. extension profiles 
[9,12]; see also [11]. From a methodological standpoint, free energy computation protocols 
have been the subject of long and sustained interest [20-22,16,23-34,5,35-37,6,7,38,39] 

Free energy computations have long been recognized to suffer from a number of sources 
of error: (i) inaccuracy of the model (Hamiltonian) itself, (ii) incomplete conformational 
sampling, and (iii) finite sample size. In biomolecular systems, the issue of model accuracy 
(i) is indeed important, as typical all- atom force-fields are generally assumed to be capable 
of no more than 1-2 ksT accuracy in estimates of free energies — e.g., of ligand bind- 
ing. That is, even with perfect sampling, computational estimates typically will not match 
experimental values. Second, like every simulation technique, free energy calculations are 
subject to errors based on (ii) incomplete conformational sampling. "Incomplete" sampling 
here refers to the lack of access to important parts of phase or conformational space: that 
is, the distribution of samples of size N generated by the simulation will not match the 
true (representative) distribution of size-iV samples which would be drawn at random from 
a very long, perfectly-sampled simulation. Incomplete conformational sampling introduces 
bias into even simple computations attempting to estimate linear quantities, such as the 
mean of some coordinate or function. Conformational sampling errors in free energy cal- 
culations have long been recognized as "hysteresis" or "Hamiltonian-lag," and a number 
of workers have made important contributions toward understanding and overcoming these 
errors — e.g., [26,27,30]. 

We focus here on the third type of error (iii) that due solely to the necessarily finite 
samples collected in a simulation. Such finite-sampling bias occurs in every non-linear calcu- 
lation, as detailed below, and should be clearly distinguished from the independent error due 
to (ii) inadequate conformational sampling. Specifically, finite-sampling error occurs even 
when conformational sampling is perfect — i.e., when representative samples of finite size 
are generated by the simulation. Finite-sampling errors in computational estimates of AF 
were first recognized by Wood and coworkers [40] and later discussed by others [3,6,7,13,14]; 
see also [41-43]. The in-depth work of Lu and Kofke presents an entropy-based descrip- 
tion of finite-sampling errors, which emphasizes the critical asymmetry between generalized 
"insertion" and "deletion" calculations [6,7]. 

Figure 1 illustrates the phenomenon of finite-sampling error for a mathematical model 
and for a biological system [44], emphasizing the universality of finite-sampling errors. Be- 
cause these inaccuracies can be many times /c#T (see Fig. 1 and Ref. [13]) — especially 
in the important context of biomolecular calculations where large system sizes limit the 
quantity of data available for analysis — there is a strong motivation to understand and 



overcome these errors. Ferrenberg, Landau and Binder showed that finite-sampling errors 
accompanying susceptibility computations can be understood on the basis of elementary 
statistical principles [42]; however, the errors in non-linear averages like AF apparently had 
remained without an explicit theoretical basis until recently [6,7,14]. In a recent monograph, 
in fact, Landau and Binder note that finite-sampling errors are "generally given inadequate 
attention" [43]. 

This report both provides fuller details of the theory outlined in [14], and also presents 
new results. Our report includes (i) a detailed proof that the expected value of a finite- 
data AF estimate (AF ra ) bounds the true free energy — independent of the distribution of 
underlying work values; (ii) full derivations of the asymptotic expressions for AF n as n — > oo 
for arbitrary — including long-tailed — distributions of the work {W) data used to estimate 
AF; (iii) analogous derivations for the root-mean-square and related "geometric" non-linear 
averages; (iv) derivation and numerical demonstration of the universal asymptotic relation 
between AF n and its fluctuation. As in our brief report [14], the present discussion makes 
use of mathematical results regarding the convergence — to "stable" limiting distributions 
[45-47], also known as Levy processes (e.g., [48]) — of the distributions of sums of variables. 

In outline, the paper now proceeds to Sec. II where formal groundwork for the discussion 
is laid. Section III rigorously proves the true free energy AF is bounded by AF n , the 
expected value of a free-energy estimate based on a finite quantity of data; analogous bounds 
apply for arbitrary non-linear averages. Derivations of the asymptotic series for AF n are 
given in sections IV and V, while Section VI derives the universal relation between AF n and 
its fluctuation. We conclude with a summary and discussion of the results in Section VII. 

II. FREE-ENERGY ESTIMATES FROM FINITE SAMPLES 

Since the work of Kirkwood [49], it has been appreciated that the free energy difference, 
AF = AFo^i, of switching from a Hamiltonian H$ to Hi is given by a non-linear average, 

AF = —ksT log [ (exp (-W ^/k B T) ) ] , (2.1) 

where ksT is the thermal unit of energy at temperature T and Wo^i is the work required to 
switch the system from H to Hi; see below. The angled brackets indicate an average over 
switches starting from configurations drawn from the equilibrium distribution governed by 
H - 

While non-equilibrium approaches to free energy calculations have been a major mo- 
tivation for this work, we should point out that our analysis applies equally to "staged" 
calculations, in which the free energy is calculated as a sum of increments. In particular, if 
one writes the free energy as a sum of incremental parts, 

AF ^i = AF ^ Al + AF Al ^ A2 + • • • + AF Afc ^ , (2.2) 

then each increment AF A .^ A . is still defined by a non-linear average analogous to (2.1) and 
thus will suffer from finite-sampling error. 

The work required for the average (2.1) can be defined in a straightforward manner. 
In instantaneous switching the work is defined by Wq^i = 7ii(x)— 7io(x) for a start (and end) 
configuration x. However, gradual switches requiring a "trajectory" -based work definition 



may also be used, as was demonstrated by Jarzynski [2,3]. In this latter case, one requires 
a Hamiltonian which interpolates between Ho and Hi] a common choice is 

H(X; x) = fto(x) + A [^(x) - W (x)] , (2.3) 

where A is a switching parameter that varies from zero to one. The work performed in 
switching gradually from Ho to Hi along a trajectory (A(t);x(t)) is given by 

W ^i = ]T [H{\; Xi_!) - W(Ai_ i; Xi-O] , (2.4) 

i 

where the subscripted configuration Xj_i is the (unique) final configuration for which A = 
Aj_i — i.e., the last configuration before A is incremented to \. In other words, the work is 
computed as the sum of those energy increments resulting only from changes in A. 

Whenever a convex, nonlinear average such as (2.1) is estimated computationally, that 
result will always be systematically biased [41] because one has only a finite amount of 
data — say, N work values. The bias results from incomplete sampling of the smallest (or 
most negative) Wo-,i values [6,7]: these values dominate the average (2.1) and cannot be 
sampled perfectly for finite N, regardless of the Wo->i distribution. This is true even for a 
rectangular distribution; the sole exception is the trivial S function, single-point probability 
density. Because of the undersampling of small work values, a running estimate of AF will 
typically decline as data is gathered, as one sees in the "staircase" plots of Fig. 1. Such 
considerations led Wood et al. [40] to consider the block-averaged n-data-point estimate of 
the free energy based on N = mn total work values {W^ fc )}, namely, 
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It represents the expected value (mean) of a free energy estimate from n data points 
is, of 



F n = -k B T log (. 
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(2.5) 



that 



(2.6) 



where m estimates have been made. Wood et al. estimated the lowest order correction 
to AF = AFqo as a% J /2nkBT, where is the variance in the distribution of work val- 
ues, W [40]. Ferrenberg, Landau and Binder discussed analogous issues for the magnetic 
susceptibility [42,43]. 

The derivations below employ continuum expressions simplified by the definitions 

w = W/k B T, f = AF/k B T, f n = AF n / k B T . (2.7) 

In terms of the probability density p w of work values, which is normalized by / dwp w (w) = 1, 
the free energy is given by the continuum analog of (2.1), 



/ = AF/k B T = - lo£ 



dw p w (w) e 



(2.8) 



The form (2.8) also occurs in equilibrium calculations [50], and forms the basis for the 
analysis of Lu and Kofke [6,7]. Finally, the finite-data average free energy, following (2.5) 
must apply the logarithm "before" the average of the n Boltzmann factors, and one has [14] 

/»» = ~ / n [ dw ip w ( w i)} !°g - 



(2.9) 



III. GENERALIZED JENSEN'S INEQUALITIES: BLOCK- AVERAGED 
ESTIMATES AS RIGOROUS BOUNDS 



Jarzynski [3] and subsequently the present authors [13] observed that the free energy is 
bounded according to 

AF < AF n , anyn. (3.1) 

Here we prove this inequality and a generalization originally stated in [13], namely, 

AF n+1 < AF n , anyn. (3.2) 

Indeed the proof given below applies to a broad class of nonlinear averages and functions. As 
noted in [14], the result (3.1) substantially extends the previous bound AF < (W) = AF± 
[15]. Further, the bounds apply for an arbitrary distribution of work values — that is, 
whether the probability density of W is multimodal, unimodal, or even rectangular. We 
note, finally, that reference [14] by the present authors failed to acknowledge the original 
statement of the bound (3.1) by Jarzynski in [3]. 

In fact, our proof will show that (3.1) and (3.2) are special cases of a more general 
inequality that depends solely on the convexity and monotonicity of the function used to 
form a nonlinear average: in the case of AF the function is the exponential — see (2.1); the 
root-mean-square is another example, when the function is g(x) = x 2 . In the remainder of 
this section, we use the mathematical convention that upper-case letters (e.g., X) indicate 
random variables whose particular values are specified by lower-case letters (e.g., x). 

The new bounds are generalizations of Jensen's inequality (see [51]), a fundamental 
property of convex functions with a host of applications including in information theory 
[52]. Jensen's inequality relates the expected value of a convex function g of a random 
variable to the same function of the expected value of its argument according to 

(g(X)) > g((X)) , (3.3) 

where the expectation value is defined in the usual way for an arbitrary function A as 

(A) = (A(X))= Jdx p{x) A{x) , (3.4) 

and p is the probability density function. By applying g~ l to (3.3), the inequality can be 
re-stated in terms of non-linear and linear averages, respectively, 

(X)° = g' 1 ((g(X))) > (X) , (3.5) 

with the additional constraint that g be increasing (so that g^ 1 is unique). Note that the 
inequality (3.5) can easily be generalized by applying the inverse of a different increasing 
function (say, h~ l ) to (3.3). 

We now state and prove the new "generalized Jensen's inequalities." 
Theorem: 

Consider estimates for the non-linear average (X) 9 based on samples of size n, 
{x±, X2, ■ ■ ■ , x n }, the expectation of which may be written as 



( x Yn = jdx 1 p(x 1 ) ■ ■ ■ Jdx n p{x n ) g 1 ([g(x 1 )-\ g(x n )]/n). (3.6) 

Note that {X){ = (X) and (X) 9 ^ = (X) 9 . Then the new inequalities, generalizing (3.5), are 

(X) 9 n > POn-i ■ (3.7) 

Strict inequality obtains whenever the random variable X is not limited to a single value 
(i.e., whenever the probability density p is not a single Dirac delta function). The direction 
of the inequality is reversed for decreasing convex functions, for instance yielding (3.1) for 
g(x) = exp (-x/k B T). 

Proof: 

Note first that (X) 9 n is defined in (3.6) as the non-linear average based on the "weighted 
set" S n of all possible n-samples {xi, . . . ,x n }. The weight of each n-sample is of course its 
probability density p n ({xi}) = ri7=i p( x i)- We will require an explicit construction of the 
set S n -i from S n , which fortunately is straightforward: for every n-sample with weight p n 
in S n , if one assigns equal weights p n /n to each of the n available (n — l)-samples given by 
deletion of a single element — namely, {x2, x^, . . . , x n }, {xi, X3, X4, . . . , x n }, and so on - 
one arrives at S n -i. The correctness of this construction follows from the factorizability of 
the density p n , and may be seen by considering the density of a particular {n — l)-sample, 
{x} = {x±, . . . ,x n -i}, which can be constructed from n different deletions: 



Pn-i({x}) = - 



1 

n 



H h J dx 

n-1 

dx p{x) Y\_ p( x i 



i=l 



n-1 

n 

i=l 



(3.8) 



Because of this construction of S n -i from S n , it is sufficient to show that the single- sample 
non-linear average of an arbitrary n-sample, namely, 



u 



n({xi}) = g 1 \ 



n 



(3.9) 



exceeds the average u n -\ based on the n available (n— l)-samples constructed from deletions, 
as above. Note that 



(«n({*i})>„ = WS, 



(3.10) 



which follows immediately from (3.6). 

To complete the proof, observe that the single-sample non-linear average (3.9) can be 
re-written in terms of smaller samples: 



Un({Xi}) 



1 1 



(3.11) 



This identity may be illustrated by considering g(xi) which occurs n — 1 times (whenever 
j 7^ 1), and hence is properly weighted as in (3.9). The expression may be further re-written 
if we denote by the original n-sample with the j'th element deleted. To each of 

these smaller samples, there corresponds a single-sample, non-linear average u n -i({xi}y]). 
Applying g to both sides of (3.9) and substituting the result for n — \ into the right-hand-side 
of (3.11), we then have 



u, 



,({x i })=9- 1 ^E»[«n-ite}[i])]j • (3-12) 



If we now consider U n -\ <-> u n _\ to be a random variable with a discrete, n-point distribution, 
we can apply the original non-linear-average inequality (3.5) to the right-hand side of (3.12), 
and obtain the desired result 



U n ({Xi}) > (Un-l)\j] , (3.13) 

where the average (• • is performed over the discrete distribution comprised of all u n _i 
values obtained from applying (3.9) to the n sets {xjjyj. This completes the proof because 
when the left-hand-side is averaged as in (3.10), the construction of S n -i from S n guarantees 
that the average over all n-samples on the right-hand side of (3.13) results in (X)^-i and 
hence (3.7). 

The result applies to any probability density p because no assumptions were made re- 
garding the distribution. Figure 1 illustrates the monotonicity of finite-data free energy 
estimates from two completely unrelated systems. 



IV. ASYMPTOTIC BEHAVIOR: FINITE MOMENTS CASE 

A. Formal Development of the Expansion 

It is possible to generate a formal expansion for the finite-data estimate f n in terms of n" 1 
for an arbitrary distribution of work values p w . In this section we consider the case where 
the second and some higher moments of the z = e~ w distribution are finite. Motivated 
by the central and related limit theorems [53,45,47] for the sum of the e~ w variables, we 
introduce a change of variables which will permit the development of a \jn expansion for 
/„. In particular, we define 

y = (e"^ + • • • + e~ Wn - ne~ f ) / b in 1/a , (4.1) 

where b\ is a constant and a < 2 is an exponent characterizing the distribution of the variable 
e~ w . The requirement that AF be finite in (2.8) further implies a > 1. The finite-data free 
energy difference can now be written 

U = ~ r dyp n (y) \og(e- f + ^y] , (4.2) 

J-cn a \ n a ) 

where c = exp (— a = (a — l)/a < 1/2, and p n is the probability density of the variable 
y normalized appropriately via 



/oo 
dyp n (y) = l. (4.3) 
-cn a 

Note that a is always positive because a > 1. 

The expansion of /„ proceeds by first noting that the sum of any set of independent 
random variables, suitably normalized as in (4.1), has a distribution which may be expressed 
as a stable (Levy) distribution function multiplied by a large-n asymptotic expansion [45,54]. 
In the case of a Gaussian limiting distribution (i.e., a = 2 or the central limit theorem), 
assume the variable z = e~ w possesses finite "Boltzmann moments" - - a mean fi = e~f , 
variance a 2 = {{z — p) 2 ), and higher central moments p p = ((z — p) p ). The normalizing 
constant in (4.1) is then b\ = a. The Boltzmann moments of course differ from the moments 
of the distribution of w. 

The so-called Edgeworth corrections to the central limit theorem indicate that the vari- 
able y = (Yh=i e~ Wi — np)/ ' \fna [cf. (4.1)] is distributed according to [55,56] 

p n (y) = p G (y,l) [l + iy 1 (y)/V^+u 2 (y)/n + --] , (4.4) 

for large n, where the remaining terms are higher integer powers of l/\/n and the Gaussian 
density is 

p G {y- a) = exp (-y 2 /2a 2 )/V2^a . (4.5) 

The functions v^, which are defined based upon the Hermite polynomials [55,45], depend 
on the original distribution of e~ w . In terms of the cumulants ki (see, e.g., [55]) of the 
distribution of z — e~ w and the Hermite polynomials defined via 

' p G (x;l) = (-l) k H k (x)p G (x;l), (4.6) 



dx k 

the lowest-order Edgeworth functions are [55,56] 

= (k 3 /6a 3 )H 3 (y) = (p 3 /6a 3 ) (y 3 - 3y) (4.7) 

My) = (k,/24a 4 )H 4 (y) + (k 2 3 / '72a 6 ) H 6 (y) (4.8) 
u 3 (y) = (k 5 /120a 5 )H 5 (y) + (k 3 k,/lUa 7 )H 7 (y) + (k 3 3 / 1296a 9 ) H 9 (y) . (4.9) 

The Vi functions are odd or even according to whether % is odd or even, in this a = 2 case. 

Before the expansion for f n can be developed, the integral (4.2) must be considered 
carefully by dividing it into three parts: 

- f , h x 



/oo 
dyPn(?/)logl e~ J + —y 
-cn a \ n a J 

/oo . . 

dyp n (y) log(e _/ ) 
_ cn a V / 

/+cn a roo 
dy Pn (y)\og(l + y/cn a )+ dy p n (y) log(l + y/cn a ) (4.10) 
-cn a J +cn a 

= -f + I(-cn a , erf) + I(cn a , oo) , (4.11) 



where the first integral in (4.10) has been evaluated exactly using the normalization of p n 
(4.3) and / represents the latter integrals of (4.10). One can now proceed by using an 
expansion for the logarithm in I(—cn a , cn a ) and by bounding terms in I(cn a , oo). 

It is possible to demonstrate rigorously that the second integral in (4.10), I(cn a , oo), 
does not materially contribute to f n — f for large n. Although, the logarithm cannot be 
expanded in a power series for y > cn a , the integral can be bounded by expressing the log 
as the integral of its derivative: 



I(cn a , oo) 



+cn a 
oo 



l+y/cn a 



dxx~ 



roo rl+y/cn a j /■oo 

< dypn(y) dxx~ 1+t = - dyp n (y) 

J+cn a J I 6 J+cn a 



1 

e J+cn a 



1 + 



cn a 



- 1 



(4.12) 
(4.13) 



with < e < 1. To extract the leading behavior of this bound, one can use the expansion 
of p n (4.4) and set e = 1. Noting that a = 1/2, one obtains 



1 f°° r 
I(cn a , oo) < ^ / dyp G (y;l) 1 + vi{y)/Vn + u 2 {y)/n + 

C\/nJ+cn a 1 



(4.14) 



Using the asymptotic properties of the error function [55], one can show that the strongest 
n dependence of I(cn a , oo) is no stronger than 



nexp (— c 2 n/2) . 



(4.15) 



The leading behavior of f n — f is thus expected to result from the first non-trivial integral 
in (4.10), I(—cn a ,cn a ). Noting again that a = 1/2 in this case, we may write 



/+Csfri 
dyp G {y, 1) 
-c-Jn 



l + v 1 (y)/ y /E+ v 2 {y)/n + ^(y)/n 3/2 + 
y/c^i - {y/cV^) 2 /2 + (y/c^f/Z - 



(4.16) 



What are the leading terms? There are no terms proportional to rr 1 ! 2 raised to any odd 
power because of symmetry considerations: the Vi functions are even for even %. The lead- 
ing terms are thus integer powers of rT 1 , and the expansion of the finite-data free-energy 
difference is of the form, 



fn = f + Vi/n + (p 2 /n 2 + 



(4.17) 



where the ipi are constants which depend on the distribution of z = e~ w . 

The explicit correction terms to /„ — / may now be obtained. First note that asympo- 
totic analysis of the integrals appearing in (4.16) in terms of the error function [55] indi- 
cates that the limits of integration may be extended to (— oo, +oo) with errors proportional 
to exp (— c 2 n/2). Straightforward integration then yields the coefficients of the expansion 
(4.17), namely, 



y? 2 = -(4/}/t 3 - 9<7 4 )/i2/i 4 . 



(4.18) 
(4.19) 



B. Coefficients for the Gaussian case 



When the distribution of work values is Gaussian, p w (W) = Pg{W,ct w ), the Boltzmann 
moments and, hence the ip coefficients of (4.17), may be computed analytically. Note that 
one cannot assume that z = e~ w obeys a Gaussian distribution because z is always non- 
negative. The moments follow from straightforward integration, which yields 



(z p ) = J m Pw {W)e- pW ' kBT = exp[p 2 a 2 w /2(k B T) 



(4.20) 



The f n expansion coefficients then follow trivially from substitution into (4.18) and (4.19). 
Setting s = a w /k B T, one finds for the first two coefficents 



<Pi = (e s2 -l)/2, 
<p 2 = (-4e 3s2 + 9e z 



6e s 



+ 1)/12. 



(4.21) 
(4.22) 



To compare this with the finding of Wood et al. for f n — /, one can expand (4.21) for small 
a w . One finds <fi ~ al J /2k B T, which is precisely the first-order prediction of Wood et al. 

[40]. ^ ^ ^ 

This analytic calculation explicitly indicates the practical shortcomings of the expansion 
(4.17). Although the leading term in f n — f is linear in 1/n, the leading coefficients are 
exponential in the square of the distribution's width. The asymptotic expression (4.17) thus 
represents a viable approximation only for a very small window about 1/n = when s 1; 
see Fig. 2. When asymmetry is added to a Gaussian distribution via the first Edgeworth 
correction (see (4.4) and, e.g., [45]), one finds that the exponential dependence of the tpi on 
cr w is only corrected linearly by the now non-zero third moment of the W distribution. 



C. Expansions for the root-mean-square and similar averages 

The root-mean-square (or standard deviation) is perhaps the best known example of a 
non-linear average. The full analysis carried out above carries over quite directly, and indeed 
applies to any non-linear average. We will now briefly consider general "root-mean-powers" 
("power means"). 

To be specific, consider the non-linear average resulting from a general power q = 2, 4, . . ., 
denoted 



R (q) = ( x qf/ q ? 



(4.23) 



where x is a variable distributed according to the (arbitrary) probability density p x 
In direct analogy with (2.9) one can define the finite-data average for as 



i#>= [f[[dx lPx (x t )} 

J i=i 



n 



i=i 



(4.24) 



The asymptotic expansion follows from the same procedure as above. One finds that the 
expansion 



R M= R M + ^/ n + ^/ n i + 



Si) 



(4.25) 



has coefficients 



V [ q) = (-l/2q)(l-q- 1 )R^a 2 /^ 

q- l {l-q- l ){2-q- l )R^/~A - (3 - <T V/8 



where fl, a 2 , and /J3 denote the mean, variance, and third central moment 
— of the distribution of the variable x q . 



(4.26) 
(4.27) 

respectively 



V. ASYMPTOTIC BEHAVIOR: DIVERGENT MOMENTS CASE 

When the distribution p z of the variable e~ w = z in (4.1) possesses a long-tail, the limiting 
distribution is not a Gaussian and the results (4.4) and (4.17) no longer hold. In particular, 
if one of the tails of p z (z) decays as z~( l+a "> with a < 2 (implying an infinite Boltzmann 
variance, a 2 ), then the distribution of the variable y in (4.1) approaches a non-Gaussian 
"stable" (Levy) law for large n [47]. Note that such power-law behavior in z corresponds to 
simple exponential decay in the work distribution. 

A long-tailed z distribution p z = p 1 also alters the form of the asymptotic expansion of 
the distribution of the sum- variable (4.1) and, hence, the expansion of /„ — which no longer 
includes solely integer powers of n -1 , as in (4.17). Instead of (4.4), the y distribution now 
takes the more complicated form [54] 

p n {y) = Pa(y) [1 + Y,*»u V (y)/n 9 ^} , (5.1) 

where p a is the appropriate stable probability density with exponent a [45-47]. The functions 
is uv , which are not available analytically, depend on the original distribution of e~ w and 
partial derivatives of the stable distribution. The exponents are given by 9(u, v) = (u + 
av)/a > 0, and the summation J2* includes u > and v > — \u/2], where \x~\ denotes the 
integer part of x. 

To analyze the asymptotic behavior of f n in this case, the starting point is again equations 
(4.1) - (4.3), which are fully general. It is useful to rewrite (4.2) by scaling the logarithm's 
argument by the constant e~? and by subtracting zero in the form of the mean of y; one 
obtains 



f-f n =r dyp B (y)[log(l + -^)--^ 

J-cn a L V cn a J cn a 



I(-cn a , oo). (5.2) 



One can now divide up the domain of integration in (5.2) into sub-parts appropriate for 
expansions of the logarithm of p n , in analogy with (4.10). Because no explicit forms for stable 
distributions are known in the range 1 < a < 2 [47], we will require separate expansions 
of p n for \y\ < 1 and y — > ±oo to obtain appropriate convergent behavior. The required 
breakdown of the integral is therefore 



f-f n = I(-cn a , -1) + /(-l, 1) + 7(1, oo) . 



(5.3) 



Each of the integrals in (5.3) requires a slightly different procedure. The first, 
I(—cn a , — 1), requires an expansion of the logarithm along with the "short-tail" y — > — oo ex- 
pansion of p n ~ p a (see below). The second integral, J(— 1, 1), uses simple convergent series 
expansions of both the logarithm and p a . Finally, 1(1, oo) requires primarily the "long-tail" 
y — > oo expansion of p a ; the series expansion of the logarithm is also used to show that 
extending the lower limit of integration to zero accrues a non-leading correction. 

Because we will extract only the leading term of f n — f, it is sufficient to use only the 
leading contribution to p n ; that is, considering (5.1) we may use the asymptotically valid 
(n — > oo) approximation p n pa p a . (The leading behavior for f n in the finite-moments case 
arises, similarly, from p n ss p G .) The required series expansions for p a in the case of positive 
summands z = e~ w are [45-47] 

p a (y,0 =ET=iC° k \y\ k ~ 1 , \y\>0 (5.4) 
-ET=iC^y- (ka+1) , y^oo (5.5) 

where the "~" sign denotes an asympotic expansion, and the coefficients — which depend 
on the sign of y — are given by 

C° k (0 = ^-ir 1 ^ 1 ^ ^ , (5.6) 

cr = h-ir iT -^^^e), (5.7) 

with e = ay > 0) = a - 1 and = £(y < 0) = 1. Note that C° k (^) = (-l)*" 1 ^"), 
and in particular, C°(^ + ) = C®(£~) = C°. Because the summands considered here are 
strictly positive, the left tail of the distribution does not exhibit power-law behavior; rather, 
it may be termed "short" or "light" and, asymptotically, is given by [47] 



Pa(V -> -OO) 



^J2na(a — 1) 




-(a-1) 



(5.8) 



We can now consider the terms in (5.3) using (5.4) - (5.8). For the sake of brevity we 
quote only the leading terms, which result from straightforward integrations (after discarding 
non- leading terms and corrections): 

a 2 r(a + §, a a (a - I)) 

I(-cn a , -1) w ^ 2 ' 7 = n' 2a (5.9) 

2c 2 (a-l) a+ 2 J27ra(a-1) 



7(-l, 1) w - (CV73C 2 ) n- 2a (5.10) 
/(l,oo) w-(cr4/d») n 1 "", (5.11) 
where 2a = 2(a — l)/a, •) is the incomplete gamma function [55], and 

/"CO r] t> 

Jt= / — -[x-log(l + x)]<oo. (5.12) 

By comparing powers of n in (5.9) - (5.11) one sees that the leading behavior of the 
finite-date free energy estimate, not surprisingly, results from the "heavy" power-law tail 
(y — > oo). Thus, using (5.11), one has 



fn ~ f « <fa-l/n^ , 



(5.13) 



with (pa-i = [C^ 'P a / 'c a J > 0. Note that (p a -i depends on a and also on the original 
probability density p z through c = e~* jb\. Furthermore, one should not expect (5.13) to be 
a useful estimate for f n — /: the next leading exponent, 2(a — l)/a is very close to a — 1 
for a < 2. 



VI. UNIVERSAL ASYMPTOTIC FLUCTUATIONS 

The fluctuations in the finite-data free energy, f n = AF n /ksT, as measured by the 
variance a n of jF n of (2.6), are of considerable interest because of their potential to provide 
parameter-free extrapolative estimates of f^ = AF/hsT [13]; see also [57]. The variance is 
given by 

(M = ((t 1 ) 2 ) = L ' log ' 1 + ^ - e- - ^ ■ («) 

For n — > oo, it was pointed out in [14] that the simple, linear relation between f n — f and 
a\ was independent of the distribution of work values — that is, universal. Here, we sketch 
the derivation for the long-tailed case when the second Boltzmann moment diverges. 

To calculate the asymptotic behavior of the fluctuations (6.1) note first that second term 
(fn — f) 2 is necessarily of higher order than f n — f. For the crucial integral of (6.1), one 
finds 

roc i noo 

/ dy Pn (y)[\og(l + y/cn a )] 2 ~ ~~[~~ ^ ? (6.2) 

Il = fdu^[\og(l + u)] 2 . (6.3) 
Comparing (6.1) - (6.3) with (5.13) and (5.12), we see that as n — > oo 

This is a linear relation that depends only on a, via the ratio 1^/ but is otherwise in- 
dependent of the initital distribution of work values (or Boltzmann factors). In the limit 
a — > 2, the ratio 1^/1^ approaches 1/2, which is the finite-Boltzmann-moment result re- 
ported in [14]. Because numerical evaluation of the integral ratio is non-trivial we note that 
for a = 1.25, 1.5, 1.75, the corresponding values are I^/I* ^ 1.43, 0.81, 0.61. 

Figure 3 illustrates the universal behavior for a — 1.5. Two integrable distributions were 
selected to ensure reliable computations. The "simple" or regulated-power-law distribution 
is defined by p rp {z) = a! /(I + z) a +1 , with a' = a = 1.5. The "power" distribution is given 
by p p (z) = z /z a , with the choice z = 10 -4 . 



VII. SUMMARY AND DISCUSSION 



This report has expanded upon the brief discussion of Ref . [14] , giving a general statistical 
theory describing the systematic error present in free-energy-difference AF estimates based 
on a finite amount of data (N work values, W). As in [14], our focus has been on the 
large-iV asymptotic behavior, motivated by the need to improve extrapolation procedures 
first explored in [13]. However, beyond simply giving further details of the derivations 
of previous results, this report has made transparent the connection to general non-linear 
averages: the bounds of Sec. Ill, which generalize Jensen's inequality, explicitly apply to 
a broad class of nonlinear computations in addition to AF estimates; and, Sec. IV gives 
asymptotic expansions for geometric averages, such as the root-mean-square. 

The universal, asymptotic relation (6.4) between the expected value of the biased AF 
estimate based on iV work values (AF^) and the fluctuation in these estimtates (o"at) is one 
of the more striking results. We have shown here, in Sec. VI, that the relation is universal 
whether or not the second moment of the distribution of Boltzmann factors, exp (— IV/hsT), 
is finite — that is, whether or not the central limit theorem applies. If not, the stable (Levy) 
distributions come into play, and the relation between AFn and depends only on the 
exponent of the limiting stable distribution. 

We hope our results will have practical application in the extrapolation process outlined 
in [13], which suggested that dramatic increases in computational efficiency may be possible. 
In this context, examination of Pade approximants to the asymptotic series, which can be 
constructed to also exhibit suitable small- N behavior, may prove fruitful. We believe, finally, 
that the statistical foundation laid in Ref. [14] and here provides a basis for the crucial but 
non-trivial task of simply understanding convergence in estimates of free energy differences 
and other non-linear averages. 
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FIGURE CAPTIONS 



Figure 1. Finite-sampling errors in AF estimates based on (a) Gaussian-distributed work 
values and (b) work values generated in a molecular-mechanics solubility comparison between 
the fatty acids palmitate and stearate. The irregular, staircase-shaped plots are the running 
estimates based on N work values, while the smooth curves depict the average running 
estimates AFjy (2.5) which are independent of the order in which the work values were 
generated. The average running estimates are also rigorous upper bounds on AF. The 
standard deviation of the zero-mean Gaussian distribution in (a) is 4k B T, for which the 
true free energy difference is AF = AFoo = — 8k B T. For the fatty acid solvation case, 
AF = AFoo ^ 13 kcal/mole; note that 1 kcal/mole = 1.7 k B T. 

Figure 2. Finite-sampling error for Gaussian-distributed work values. The expected value of 
the dimensionless finite-sampling inaccuracy, (AF n — AF)/k B T for n data points is plotted 
as a function of 1/n. From top to bottom, the data sets represent numerical values of the 
error for Gaussian distributions of work values with standard deviations, a w /k B T of 3, 2, 1.5, 
and 1. The lines (dashed for a w /k B T = 1.5, solid for a w /k B T = 1) depict the asymptotic 
linear behavior for the two smallest widths. 



Figure 3. The universal n — > oo relation between AF„ — AF and its fluctuation a n for the 
long-tailed case when the Levy index is a = 1.5. The solid line depicts the universal slope 
la/ la — 0-815 for a = 1.5, as given in (6.4) and the succeeding text. The data for the 
"power" and "simple" distributions, described in the text, are each shown for a' — a — 1.5. 
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